9 research outputs found

    De uitvoering van gepartitioneerde programma's op multiprocessorsystemen (The execution of partitioned programs on multiprocessor systems)

    No full text

    A VLSI processor switch for a dual IEEE-796 bus with shared and dual-port memories

    No full text
    The authors have developed a VLSI switch that controls and arbitrates the signals of a dual Multibus (IEEE-796 standard) system. There is one switch per processor, placed between the processor and the buses. Besides pure shared-memory accesses, the switch also recognizes requests for another processor's dual-port memory and routes the data transfer accordingly. The chip was designed with Silvar-Lisco CAD software and simulated using HILO-3 development tools. A detailed simulation of the standard cells was possible using a library of gate-level models with the appropriate timing characteristics. Functional models of the processor and the memories, together with the gate-level switch model, provided a complete dynamic simulation of the multiprocessor system. The simulation allows a full check of the VLSI component, including a functional description of processor and memories, before the manufacturing process.
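The routing decision described in this abstract can be illustrated with a minimal sketch. The address map, window sizes, and the names `route_access` and `Route` are hypothetical and chosen only for illustration; the abstract does not give the actual memory layout or decoding logic of the chip.

```python
from enum import Enum
from typing import Optional, Tuple


class Route(Enum):
    LOCAL = "local memory"
    REMOTE_DUAL_PORT = "another processor's dual-port memory"
    SHARED = "shared memory on the bus"


# Hypothetical address map (assumed values, not taken from the paper).
DUAL_PORT_BASE = 0x80000   # start of the dual-port windows
DUAL_PORT_SIZE = 0x01000   # one window per processor
NUM_PROCESSORS = 4
SHARED_BASE = 0xC0000      # start of the shared-memory region


def route_access(address: int, own_id: int) -> Tuple[Route, Optional[int]]:
    """Classify an access the way a per-processor switch might.

    Returns the route and, for dual-port accesses, the owning processor.
    """
    window_end = DUAL_PORT_BASE + NUM_PROCESSORS * DUAL_PORT_SIZE
    if DUAL_PORT_BASE <= address < window_end:
        owner = (address - DUAL_PORT_BASE) // DUAL_PORT_SIZE
        if owner == own_id:
            return Route.LOCAL, owner          # own dual-port memory
        return Route.REMOTE_DUAL_PORT, owner   # forward to the owner's board
    if address >= SHARED_BASE:
        return Route.SHARED, None              # arbitrate for a shared bus
    return Route.LOCAL, None                   # private memory, no bus needed
```

In this sketch, processor 0 reading address `0x82004` would be routed to processor 2's dual-port window, while any address at or above `SHARED_BASE` goes to shared memory.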

    Implementation of an automatic program partitioner on a homogeneous multiprocessor

    No full text
    The design and the first results of a prototype multiprocessor featuring the automatic partitioning of Fortran programs are presented. The conversion of the source program to a task schedule is based on a dataflow analysis that takes into account the task size, the number of processors, and the communication between tasks. Various levels of task granularity can be selected, which allows a tradeoff between the amount of parallelism and the communication overhead. The architecture consists of up to 20 off-the-shelf processor boards, configured around the IEEE-796 bus. Each board has a dual-port memory, used for semaphore storage and distributed synchronization. Experiments using the automatic partitioner for five programs show a near-linear speedup, even for small problems, and a high utilization of the floating-point processor when the task size is large.
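As a rough illustration of how a task schedule can account for task size, processor count, and inter-task communication, the sketch below uses greedy earliest-finish list scheduling. This is a generic textbook technique swapped in for illustration, not the partitioner described in the abstract; the function name `list_schedule`, the uniform `comm_cost`, and the example task graph are all assumptions.

```python
from typing import Dict, List, Tuple


def list_schedule(
    durations: Dict[str, float],
    deps: Dict[str, List[str]],
    comm_cost: float,
    num_procs: int,
) -> Tuple[Dict[str, Tuple[int, float, float]], float]:
    """Greedy earliest-finish scheduling of a task graph.

    durations: task -> execution time, listed in topological order.
    deps:      task -> list of predecessor tasks.
    comm_cost: assumed uniform delay when a predecessor ran elsewhere.
    Returns (task -> (processor, start, finish), makespan).
    """
    proc_free = [0.0] * num_procs
    placement: Dict[str, Tuple[int, float, float]] = {}
    for task, dur in durations.items():
        best = None
        for p in range(num_procs):
            ready = 0.0
            for pred in deps.get(task, []):
                pred_proc, _, pred_finish = placement[pred]
                # Data from another processor arrives comm_cost later.
                delay = comm_cost if pred_proc != p else 0.0
                ready = max(ready, pred_finish + delay)
            start = max(ready, proc_free[p])
            finish = start + dur
            if best is None or finish < best[2]:
                best = (p, start, finish)
        placement[task] = best
        proc_free[best[0]] = best[2]
    makespan = max(finish for _, _, finish in placement.values())
    return placement, makespan


# Hypothetical diamond-shaped task graph: a -> {b, c} -> d.
durations = {"a": 2, "b": 3, "c": 3, "d": 1}
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
_, t_two = list_schedule(durations, deps, comm_cost=1, num_procs=2)
_, t_one = list_schedule(durations, deps, comm_cost=1, num_procs=1)
```

Raising `comm_cost` relative to the task durations shrinks the benefit of the second processor, which mirrors the granularity tradeoff the abstract mentions: finer tasks expose more parallelism but pay more communication.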

    Performance measurement and analysis of a shared-memory multiprocessor

    No full text
    The performance analysis of the VPS (Virtual Processor System) multiprocessor is presented. The system incorporates four to twenty 8-MHz 8086/8087 processor boards and is built with two main objectives in mind: speed and transparent operation. The von Neumann shared-bus bottleneck is alleviated by executing the code in private memories and distributing the synchronization using dual-port memories. Furthermore, the system is easily programmable using the LEM analyzer, an automatic program partitioner developed especially for the VPS. Both hardware and software performance measurements were taken, using detailed, minimally intrusive timing probes and emulation traces. The tests covered six programs, including linear system solving, sorting, matrix multiplication, and the fast Fourier transform. Apart from the raw speedup figures, four sources of overhead were analyzed: the processor idle time, the synchronization overhead, the bus delay, and the task-switch time. The results show that the task granularity has the largest impact on the performance.
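The quantities named in this abstract (speedup and the four overhead sources) can be related with a short sketch. The function names and all timing values below are illustrative assumptions; none of the numbers come from the paper.

```python
from typing import Dict


def speedup(t_serial: float, t_parallel: float) -> float:
    """Ratio of single-processor time to multiprocessor time."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, num_procs: int) -> float:
    """Speedup divided by the number of processors."""
    return speedup(t_serial, t_parallel) / num_procs


def overhead_fractions(
    overheads: Dict[str, float], t_parallel: float, num_procs: int
) -> Dict[str, float]:
    """Express each overhead as a share of total processor-time.

    overheads: overhead name -> time summed over all processors.
    The remainder is reported under the key "useful".
    """
    total = t_parallel * num_procs
    fractions = {name: value / total for name, value in overheads.items()}
    fractions["useful"] = (total - sum(overheads.values())) / total
    return fractions


# Made-up measurements for a 4-processor run (seconds).
measured = {"idle": 8.0, "synchronization": 6.0, "bus delay": 4.0, "task switch": 2.0}
shares = overhead_fractions(measured, t_parallel=25.0, num_procs=4)
s = speedup(80.0, 25.0)
e = efficiency(80.0, 25.0, 4)
```

With these assumed numbers the four overheads account for 20% of the total processor-time, which matches the gap between the ideal efficiency of 1.0 and the computed efficiency.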